LIVY-246 Support multiple Spark home in runtime #318
base: master
Conversation
Codecov Report
@@ Coverage Diff @@
## master #318 +/- ##
============================================
- Coverage 70.46% 67.84% -2.63%
+ Complexity 713 686 -27
============================================
Files 96 96
Lines 5079 5091 +12
Branches 754 756 +2
============================================
- Hits 3579 3454 -125
- Misses 995 1156 +161
+ Partials 505 481 -24
Continue to review full report at Codecov.
I'm not the best person to review how well this accomplishes its purpose, but I've given config-related feedback where I can.
version.map {version => Option(get(s"livy.server.spark-home_$version"))
  .orElse(throw new IllegalArgumentException(
    s"Spark version: $version is not supported"))
}.getOrElse(Option(get(s"livy.server.spark-home")).orElse(sys.env.get("SPARK_HOME")))
I think we should still use SPARK_HOME here as before, not livy.server.spark-home directly.
conf/livy.conf.template
# livy.server.spark-home_1.5.2=<path>/spark-1.5.2
# livy.server.spark-home_1.6.3=<path>/spark-1.6.3
# livy.server.spark-home_2.0.1=<path>/spark-2.0.1
# livy.server.spark-home_2.1.0=<path>/spark-2.1.0
The _ should be a - to match the conf style.
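For example, the template lines above would then read (illustrative):
# livy.server.spark-home-2.1.0=<path>/spark-2.1.0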
-    val builder = new SparkProcessBuilder(livyConf)
+    val builder = new SparkProcessBuilder(livyConf, request.sparkVersion)
+    request.sparkVersion.map({ value =>
+      builder.env("SPARK_CONF_DIR", livyConf.sparkHome(request.sparkVersion)
So I'm still figuring out the ins and outs of Livy, but I'm not quite sure why SPARK_CONF_DIR needs to be updated here. Where is it used?
Hi @ajbozarth, if we don't set SPARK_CONF_DIR here and SPARK_CONF_DIR is already set in the environment, the existing value takes precedence, so the requested Spark version change might not take effect.
For example, on the Hortonworks distribution, SPARK_CONF_DIR/spark-defaults.conf sets "spark.yarn.jar" to "hdfs:///hdp/spark-assembly.1.6.2.jar", so if we don't overwrite SPARK_CONF_DIR, that spark-assembly jar will always be used regardless of the requested Spark version.
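To make the scenario concrete, the distribution-provided file that causes the problem looks roughly like this (the property and value come from the comment above; the exact file location is illustrative):
# $SPARK_CONF_DIR/spark-defaults.conf shipped with the distribution
spark.yarn.jar hdfs:///hdp/spark-assembly.1.6.2.jar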
@@ -246,14 +246,20 @@ class LivyConf(loadDefaults: Boolean) extends ClientConf[LivyConf](null) {
   def sparkDeployMode(): Option[String] = Option(get(LIVY_SPARK_DEPLOY_MODE)).filterNot(_.isEmpty)

   /** Return the location of the spark home directory */
-  def sparkHome(): Option[String] = Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
+  def sparkHome(version: Option[String] = None): Option[String] = {
+    version.map {version => Option(get(s"livy.server.spark-home_$version"))
I think we should abstract out livy.server.spark-home_$version here into a def like the vals above. Something along the lines of:
def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null)
Fixed the changes as suggested. Please review.
Thanks for the updates, I added some follow-up comments.
@@ -245,15 +245,25 @@ class LivyConf(loadDefaults: Boolean) extends ClientConf[LivyConf](null) {
   /** Return the spark deploy mode Livy sessions should use. */
   def sparkDeployMode(): Option[String] = Option(get(LIVY_SPARK_DEPLOY_MODE)).filterNot(_.isEmpty)

+  /** Return the spark home version */
+  def SPARK_HOME_VER(version: String): Option[String] = {
I was thinking more along the lines of adding
def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null)
to the LivyConf object above as a def, similar to the vals, then doing Option(get(SPARK_HOME_VER(version))), which you could then do an orElse throw Exception on inside the sparkHome def as before.
If I declare def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null), then I can't use it inside def sparkHome(version: Option[String] = None): Option[String] as version.map {version => Option(get(SPARK_HOME_VER(version))).orElse(throw ...)}, since it is a map and it will not accept null values.
If I'm understanding you correctly, that's where my comment below would fix it. I'll edit the comment below to clarify it.
   /** Return the location of the spark home directory */
-  def sparkHome(): Option[String] = Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
+  def sparkHome(version: Option[String] = None): Option[String] = {
+    version.map {version => SPARK_HOME_VER(version)
On a second look it may be better to use a case here. Assuming the SPARK_HOME_VER def I suggested above, I recommend something like:
version match {
  case Some(version) => Option(get(SPARK_HOME_VER(version))).orElse(throw ...)
  case None => Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
}
where the None case is just the old code. This would be easier to read overall.
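Putting the two suggestions together, the whole change might end up looking roughly like the sketch below (an illustration only; Entry, get, and SPARK_HOME are the existing LivyConf definitions, and the dash-style config key follows the earlier review comment):

// In the LivyConf companion object, alongside the other config Entry vals (sketch):
def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null)

// In the LivyConf class (sketch):
/** Return the location of the spark home directory, optionally for a specific Spark version. */
def sparkHome(version: Option[String] = None): Option[String] = version match {
  case Some(v) => Option(get(SPARK_HOME_VER(v)))
    .orElse(throw new IllegalArgumentException(s"Spark version: $v is not supported"))
  case None => Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
}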
Fixed the suggested changes. Please review.
Moved the version config in with the rest of the Entry definitions and also added code to set up SPARK_CONF_DIR. We need to set SPARK_CONF_DIR; otherwise spark-submit will pick up the SPARK_CONF_DIR already set in the default environment, so it is now set according to the requested Spark version.
Thanks for the fixes. I found one nit and made a comment on the SPARK_CONF_DIR code you added (I understand why it's needed now)
conf/livy-env.sh.template
@@ -1,4 +1,4 @@
-#!/usr/bin/env bash
+#!/usr/bin/env bash
I think this was added by accident
@@ -62,7 +63,11 @@ object BatchSession {
         request.conf, request.jars, request.files, request.archives, request.pyFiles, livyConf))
     require(request.file != null, "File is required.")

-    val builder = new SparkProcessBuilder(livyConf)
+    val builder = new SparkProcessBuilder(livyConf, request.sparkVersion)
+    request.sparkVersion.map({ value =>
You don't need maps here, this should work fine as:
val sparkConfDir = livyConf.sparkHome(request.sparkVersion).map(_ + File.separator + "conf")
builder.env("SPARK_CONF_DIR", sparkConfDir)
I think we will still need the map inside builder.env, since sparkConfDir is an Option and builder.env takes a String. Agreed we can remove the first map over request.sparkVersion. Will make the changes and push an update.
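Combining the suggestion above with the Option handling, the BatchSession code might end up looking roughly like this sketch (it assumes the SparkProcessBuilder constructor and env method from this PR, plus java.io.File):

// Sketch only: resolve the Spark home for the requested version (if any) and point
// SPARK_CONF_DIR at its conf directory, so an already-exported SPARK_CONF_DIR does not
// override the requested Spark version.
val builder = new SparkProcessBuilder(livyConf, request.sparkVersion)
livyConf.sparkHome(request.sparkVersion)
  .map(_ + File.separator + "conf")
  .foreach(confDir => builder.env("SPARK_CONF_DIR", confDir))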
Fixed the suggested changes in SPARK_CONF_DIR. Please review.
LGTM
What's the difference compared to #232?
@ayushiagarwal do you want to reopen this on apache/incubator-livy and address @jerryshao's concern about his similar PR?
@ayushiagarwal @ajbozarth
If @ayushiagarwal lets this continue to sit idle for another week, I think it would be OK to reopen this on apache/incubator-livy yourself, @karthikknnatarajan.
Sounds good. I will drop a note here before I create a new PR on apache/incubator-livy.
Made code changes to enable multiple Spark versions on Livy for Spark batch jobs. The user can pass sparkVersion at runtime for batch jobs, and SPARK_HOME and SPARK_CONF_DIR will be set according to the given sparkVersion.
The path for livy.server.spark-home_$version needs to be set in livy.conf; only the versions added there will be supported.
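For example, a minimal setup might look like the following (paths, version numbers, and the application class are illustrative, and the final config key style, underscore versus dash, follows the review discussion above):

# livy.conf
livy.server.spark-home_2.1.0=/opt/spark-2.1.0
livy.server.spark-home_1.6.3=/opt/spark-1.6.3

A batch job could then request a specific version by including the new sparkVersion field in the POST /batches body:

{
  "file": "/path/to/app.jar",
  "className": "com.example.App",
  "sparkVersion": "2.1.0"
}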
Assumption: